GLOWS - fix duplication issue#3175
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses duplicate GLOWS L1A histogram records by adding a deduplication step in generate_histogram_dataset, keeping only the first occurrence per (imap_start_time, imap_time_offset) pair and emitting a warning when duplicates are removed. It also adds a regression test intended to validate that the resulting dataset has unique time records.
Changes:
- Add deduplication of histogram records in
generate_histogram_datasetkeyed by IMAP start time and time offset. - Log a warning when duplicate histogram records are filtered out.
- Add an external-data-based test intended to verify deduplication behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
imap_processing/glows/l1a/glows_l1a.py |
Adds deduplication of histogram L1A records before building the xarray dataset. |
imap_processing/tests/glows/test_glows_l1a_cdf.py |
Adds a new external test asserting uniqueness of epochs in the generated histogram dataset. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @pytest.mark.external_test_data | ||
| def test_generate_histogram_dataset_deduplicates(in_flight_packet_path): | ||
| hist_l0, _ = decom_packets(in_flight_packet_path) | ||
| hist_l1a = [HistogramL1A(h) for h in hist_l0] | ||
| glows_attrs = create_glows_attr_obj() | ||
|
|
||
| dataset = generate_histogram_dataset(hist_l1a, glows_attrs) | ||
|
|
||
| epochs = dataset["epoch"].values.tolist() | ||
| assert len(epochs) == len(set(epochs)) | ||
|
|
||
|
|
| # Deduplicate by (imap_start_time, imap_time_offset), keeping the first occurrence. | ||
| seen_times: dict = {} | ||
| for hist in hist_l1a_list: | ||
| key = ( | ||
| hist.imap_start_time.seconds, | ||
| hist.imap_start_time.subseconds, | ||
| hist.imap_time_offset.seconds, | ||
| hist.imap_time_offset.subseconds, | ||
| ) | ||
| if key not in seen_times: | ||
| seen_times[key] = hist | ||
| dedup_hists = list(seen_times.values()) |
tech3371
left a comment
There was a problem hiding this comment.
I don't enough on the background. That's why I asked Copilot to review it. Since it didn't find any high priority recommendation, this change looks good to me. If you are able, I agree with copilot about updating test to reflect new changes tests.
4ca4f0a
into
IMAP-Science-Operations-Center:dev
This pull request adds logic to deduplicate histogram entries in the
generate_histogram_datasetfunction, ensuring that only the first occurrence of each unique(imap_start_time, imap_time_offset)pair is kept. Additionally, a new test verifies that the deduplication works as intended.Deduplication logic:
generate_histogram_dataset(inglows_l1a.py) that filters out duplicate histogram entries based onimap_start_timeandimap_time_offset, logging a warning if any duplicates are found.Testing:
test_generate_histogram_dataset_deduplicates(intest_glows_l1a_cdf.py) to ensure the deduplication logic works correctly by asserting that all epochs in the resulting dataset are unique.